Abstract



In this article, a version of sequential mastery testing (i.e., classifying students 
as a master/non-master or continuing testing by administering another item or testlet) is 
studied where response behavior is modeled by a multidimensional item response theory (IRT) 
model. First, a general theoretical framework is outlined that is based on a combination of 
Bayesian sequential decision theory and multidimensional IRT. Then it is pointed out how 
multidimensional IRT-based sequential mastery testing can be generalized to adaptive item- 
and testlet-selection rules, that is, to the case where the choice of the next item or testlet to be 
administered is optimized using the information from previous responses. Both compensatory 
and conjunctive loss structures are considered. Simulation studies are used to evaluate (1) the 
performance, in terms of average loss, of multidimensional IRT-based sequential mastery testing 
as a function of the number of items administered per testing stage, (2) the effects on average 
loss when turning the sequential procedure into an adaptive sequential procedure, and (3) the impact 
on average loss when the multidimensional structure is ignored and a unidimensional IRT model 
is used in the decision procedure. 

Key words: adaptive testing, Bayesian sequential decision theory, mastery testing, 
item response theory, multidimensional item response theory. 
Introduction

In an adaptive mastery test (AMT), the decision is to classify a student as a master, a 
non-master, or to continue testing and administering another item or testlet (i.e., items within 
a batch that are strongly related). In the sequel, we will assume that another testlet rather than 
another item is presented in case of continuing testing. Adaptive mastery tests are designed with 
the goal of maximizing the probability of making correct classification decisions (i.e., declaring 
mastery or non-mastery) while at the same time minimizing test length (Lewis & Sheehan, 
1990). For instance, Lewis and Sheehan (1990) showed in a simulation study that average test 
lengths could be reduced by half without sacrificing classification accuracy. In AMT, both the 
stopping rule (i.e., termination criterion) and testlet selection mechanism are adaptive. In other 
words, test takers with a low and high level of ability are classified as non-master and master, 
respectively, whereas those with an intermediate level of ability are presented another testlet. 
Furthermore, student’s ability measured on a latent continuum is estimated after each response, 
and the next testlet is selected such that its difficulty matches student’s last ability estimate. 
Doing so, able students can avoid doing too many easy items and less able students can avoid 
being exposed to too many difficult items. An implicit assumption is that items have unequal 
difficulty implying that the probability to answer an item correctly is not equal for all items 
in the pool, that is, response behavior is modeled by an item response theory (IRT) model. In 
case the termination criterion is determined using Bayesian sequential decision theory (e.g., De 
Groot, 1970; Lehmann, 1986), Vos and Glas (2000) denote an AMT as an adaptive sequential 
mastery test (ASMT), which combines the strong points of both approaches. 

Three basic elements can be identified in Bayesian sequential decision theory. In 
addition to a measurement model relating the probability of a correct response to student’s 
(unknown) ability and a loss function evaluating the total costs and benefits for each possible 
combination of decision outcome and ability, cost of test administration (‘cost per observation’) 
must be explicitly specified in this approach. Doing so, maximum expected losses associated 
with the non-mastery and mastery decisions can now be calculated straightforwardly at each stage 
of testing. As far as the maximum expected loss associated with continuing testing is concerned, 
this quantity is determined by averaging the maximum expected losses associated with each of 
the possible future decision outcomes with weights equal to the probability of observing those 
outcomes (i.e., the posterior predictive distributions). Optimal rules (i.e., Bayesian sequential 
rules) are now obtained by minimizing the posterior expected losses associated with all possible 
decision rules at each stage of testing using techniques of dynamic programming (i.e., backward 

Adaptive Sequential Mastery Testing - 3 

induction). Backward induction starts by considering the final stage of testing (where no option 
to continue testing is available) and then works backward to the first stage of testing. Decision 
rules are hereby prescriptions specifying for each possible response pattern what decision (i.e., 
declare mastery/non-mastery or continue testing) has to be taken. The Bayes principle assumes 
that prior knowledge about student’s ability is available and can be characterized by a probability 
distribution called the prior. This prior probability represents our best prior beliefs concerning 
student’s ability, that is, before any testlet has been administered. 

The impact of IRT-based sequential mastery testing (SMT), in which the next item to be 
administered is randomly selected within the Bayesian sequential decision-theoretic framework, 
and of ASMT on average loss, proportion of correct classification decisions, and proportion of testlets 
given was investigated by Vos and Glas (2000) in a number of simulation studies using the 
1PL as well as the 3PL testlet model. Two different dependence structures of testlet responses 
were introduced for the 3PL testlet model. First, it was assumed that all item responses were 
independent, given student’s ability. Secondly, a hierarchical IRT model was used reflecting 
a greater similarity of responses to items within than between testlets. For the loss structure 
involved, a linear loss function was adopted implying that the distance between student’s 
ability and the cut-off point $\theta_c$, which is determined in advance by the decision-maker on the 
underlying latent ability $\theta$ using standard-setting techniques, is taken into account. 

The results of the simulation studies indicated that the average loss in the SMT and 
ASMT conditions decreased considerably compared to the fixed test condition, mainly due 
to a significant decrease of testlets administered. The number of correct decisions remained 
relatively stable. With the 3PL model, ASMT produced considerably better results than SMT, 
while with the 1PL model the results of ASMT were only slightly better. When testlet response 
behavior was simulated by a hierarchical IRT model with within-person ability variance, average 
loss increased. Ignoring the within-person variance in the decision procedure resulted in a 
further inflation of losses. Across studies, the minimal variance criterion (i.e., maximizing the 
expected reduction in the variance of the difference between the losses of the mastery and 
non-mastery decision) and selection of testlets with maximum information near the cut-off point $\theta_c$ 
produced the best results, but the difference with the maximum information at the EAP estimate 
of ability was very small. 

The purpose of this article is to study a version of ASMT where response behavior 
is modeled by a multidimensional 1PL testlet model. The loss structure involved will be 
considered for both conjunctive (i.e., minimal requirements for each ability) and compensatory 
(i.e., low performance on one ability can be compensated by high performance on another 
ability) testing strategies. The article concludes with a simulation study that examines the gain 
of an SMT over a fixed-length mastery test and, in turn, the gain of an ASMT over an SMT 
using a multidimensional 1PL testlet model. As in Vos and Glas (2000), gain will be defined in 
terms of average loss, the average number of testlets administered, and the percentage of correct 
classification decisions. 



Definition of the decision problem 

In the following, it will be assumed that the variable-length mastery problem consists of 
$S$ ($S > 1$) stages labeled $s = 1, \ldots, S$ and at each stage a testlet can be administered. This testlet 
consists of one or more items indexed with $i$ and the observed item responses for a randomly 
sampled student will be denoted by a discrete random variable $U_i$, with realization $u_i$. Let the 
vector of item responses $\mathbf{u}_s$ be the response pattern to the $s$-th testlet. For $s = 1, \ldots, S$ the 
decisions will be based on a statistic $w_s$ which is a function of the response patterns $\mathbf{u}_1, \ldots, \mathbf{u}_s$, that is, 
$w_s = f(\mathbf{u}_1, \ldots, \mathbf{u}_s)$. In many cases, $w_s$ will be the response pattern $\mathbf{u}_1, \ldots, \mathbf{u}_s$ itself. However, 
below it will become clear that some computations are only feasible if the information of the 
complete response pattern is aggregated. At each stage of testing $s$ ($s = 1, \ldots, S - 1$) a decision 
rule $d(w_s)$ can be defined as 

$$
d(w_s) = \begin{cases} m & \text{mastery decision} \\ n & \text{non-mastery decision} \\ c & \text{testing is continued.} \end{cases} \tag{1}
$$

At the final stage of testing, stage S, only the two mastery classification decisions m and n 
are available. Mastery will be defined in terms of the latent proficiency continuum of the IRT 
model. 



Multidimensional IRT models 

Multidimensional IRT models are IRT models for response behavior where the 
responses depend on more than one latent ability. Multidimensional IRT models for 
dichotomously scored items were first presented by McDonald (1967) and Lord and Novick 
(1968). These authors use a normal ogive to describe the probability of a correct response. 
McDonald (1967,1997) developed an estimation procedure based on an expression for the 
association between pairs of items derived from a polynomial expansion of the normal ogive. 
The procedure is implemented in NOHARM (Normal-Ogive Harmonic Analysis Robust 
Method, Fraser, 1988). An alternative approach using all information in the data, and therefore 
labeled "Full Information Factor Analysis", was developed by Bock, Gibbons, and Muraki 
(1988). This approach is a generalization of the marginal maximum likelihood (MML) and 
Bayes modal estimation procedures for unidimensional IRT models (see Bock & Aitkin, 1981; 
Mislevy, 1986), and has been implemented in TESTFACT (Wilson, Wood, & Gibbons, 1991). 
A Bayesian estimation procedure using a Markov Chain Monte Carlo (MCMC) technique has 
been presented by Beguin and Glas (1998). 

A comparable model using a logistic rather than a normal-ogive representation has 
been studied by Andersen (1985), Glas (1992), Reckase (1985, 1997) and Ackerman (1996a 
and 1996b). In the present article, the logistic version of the model will be used. In the logistic 
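version, the probability of a correct response is given by 

$$
p(U_i = 1 \mid \theta_1, \ldots, \theta_Q, a_{i1}, \ldots, a_{iQ}, b_i, c_i) = c_i + (1 - c_i) \frac{\exp\left(\sum_{q=1}^{Q} a_{iq}\theta_q - b_i\right)}{1 + \exp\left(\sum_{q=1}^{Q} a_{iq}\theta_q - b_i\right)}, \tag{2}
$$

where $\theta_1, \ldots, \theta_Q$ are ability parameters, $a_{i1}, \ldots, a_{iQ}$ factor loadings, $b_i$ the item difficulty and $c_i$ 
the guessing parameter. The probability of a response pattern is given by 

$$
p(\mathbf{u} \mid \mathbf{a}, \mathbf{b}, \mathbf{c}, \boldsymbol{\mu}, \Sigma) = \int \cdots \int p(\mathbf{u} \mid \boldsymbol{\theta}, \mathbf{a}, \mathbf{b}, \mathbf{c})\, g(\boldsymbol{\theta} \mid \boldsymbol{\mu}, \Sigma)\, d\boldsymbol{\theta}, \tag{3}
$$

with $p(\mathbf{u} \mid \boldsymbol{\theta}, \mathbf{a}, \mathbf{b}, \mathbf{c})$ the probability of a response pattern given $\boldsymbol{\theta}$, which is derived from (2) 
using the assumption of local independence, and $g(\boldsymbol{\theta} \mid \boldsymbol{\mu}, \Sigma)$ the $Q$-variate normal distribution. 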
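
As an illustration of (2) and of the local independence assumption used in (3), the following minimal Python sketch evaluates the response probability for one item and the probability of a complete response pattern given the ability vector. The function names `p_correct` and `p_pattern` and the item parameter values are hypothetical and chosen only for this example.

```python
import math
from itertools import product

def p_correct(theta, a, b, c=0.0):
    """Probability of a correct response under the logistic model (2)."""
    z = sum(a_q * th_q for a_q, th_q in zip(a, theta)) - b
    return c + (1.0 - c) / (1.0 + math.exp(-z))

def p_pattern(u, theta, items):
    """Probability of response pattern u given theta (local independence)."""
    prob = 1.0
    for u_i, (a, b, c) in zip(u, items):
        p = p_correct(theta, a, b, c)
        prob *= p if u_i == 1 else 1.0 - p
    return prob

# Two hypothetical items: (factor loadings, difficulty, guessing parameter).
items = [((1.0, 0.5), 0.0, 0.2), ((0.3, 1.2), -0.5, 0.0)]
theta = (0.4, -0.1)

# The probabilities of all possible response patterns sum to one.
total = sum(p_pattern(u, theta, items) for u in product((0, 1), repeat=len(items)))
```

The marginal probability in (3) would additionally require integrating `p_pattern` over the $Q$-variate normal ability distribution, for instance by numerical quadrature.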

Compensatory and Conjunctive-Disjunctive Loss Functions 

In the framework of the analysis of dichotomous dominance data, Coombs and Kao 
(1955, also see Coombs, 1960) make an important distinction between conjunctive-disjunctive 
and compensatory multidimensional models. The IRT model discussed above is a compensatory 
model because in determining the probability the ability dimensions are weighted with the factor 
loadings. However, the distinction between conjunctive, disjunctive and compensatory relations 
between latent variables and manifest variables can also be applied to define a loss structure. 

Compensatory loss functions. First, an example of a compensatory loss structure will be given. 
Consider two dimensions. Let $\theta_1, \theta_2$ and $\theta_{1c}, \theta_{2c}$ denote the test taker’s proficiency level and 
some pre-specified cut-off points in the latent space, respectively. Consider a line in the 
two-dimensional proficiency space defined by $A_1(\theta_1 - \theta_{1c}) + A_2(\theta_2 - \theta_{2c}) = 0$. This line divides 
the latent space into two subspaces: persons with a proficiency in one subspace are masters, the 
persons in the other subspace are non-masters. The loss functions for the mastery and non-mastery 
decisions are given by 

$$
L(m, \theta_1, \theta_2) = \max\{sC,\; sC + A_1(\theta_1 - \theta_{1c}) + A_2(\theta_2 - \theta_{2c})\} \tag{4}
$$

with $A_1, A_2 < 0$ and 

$$
L(n, \theta_1, \theta_2) = \max\{sC,\; sC + B_1(\theta_1 - \theta_{1c}) + B_2(\theta_2 - \theta_{2c})\}, \tag{5}
$$

with $B_1, B_2 > 0$; $C$ is the cost of delivering one testlet, and $sC$ is the cost of delivering $s$ testlets. To 
ensure that $B_1(\theta_1 - \theta_{1c}) + B_2(\theta_2 - \theta_{2c}) = 0$ defines the same line as $A_1(\theta_1 - \theta_{1c}) + A_2(\theta_2 - \theta_{2c}) = 0$, 
the additional constraint $A_1/A_2 = B_1/B_2$ is imposed. Notice that the loss structure is 
compensatory in the sense that a proficiency below a cut-off score on one dimension can be 
compensated by a proficiency above a cut-off score on the other dimension. 

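In $Q$ dimensions, the loss function becomes 

$$
L(m, \boldsymbol{\theta}) = \max\{sC,\; sC + \mathbf{A}'(\boldsymbol{\theta} - \boldsymbol{\theta}_c)\} \tag{6}
$$

and 

$$
L(n, \boldsymbol{\theta}) = \max\{sC,\; sC + \mathbf{B}'(\boldsymbol{\theta} - \boldsymbol{\theta}_c)\}, \tag{7}
$$

where $\mathbf{A}$ and $\mathbf{B}$ are vectors of weights with all elements negative and positive, respectively, 
and $\boldsymbol{\theta}$ and $\boldsymbol{\theta}_c$ are the ability vector and a vector of cut-off points, respectively. An additional 
constraint is that $\mathbf{A}'(\boldsymbol{\theta} - \boldsymbol{\theta}_c) = 0$ and $\mathbf{B}'(\boldsymbol{\theta} - \boldsymbol{\theta}_c) = 0$ define the same $(Q - 1)$-dimensional 
linear sub-space. 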
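
The compensatory losses (6) and (7) can be sketched in a few lines of Python. The function names and the numerical values below are hypothetical; the weights mirror the values $\mathbf{A} = (-1, -1, -1)$ and $\mathbf{B} = (1, 1, 1)$, and the cost per testlet is an assumed value used only for illustration.

```python
def loss_mastery(theta, theta_c, A, s, C):
    """Loss (6) of a mastery decision after s testlets costing C each."""
    lin = sum(A_q * (t - tc) for A_q, t, tc in zip(A, theta, theta_c))
    return max(s * C, s * C + lin)

def loss_nonmastery(theta, theta_c, B, s, C):
    """Loss (7) of a non-mastery decision after s testlets costing C each."""
    lin = sum(B_q * (t - tc) for B_q, t, tc in zip(B, theta, theta_c))
    return max(s * C, s * C + lin)

theta_c = (0.0, 0.0, 0.0)
A, B = (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0)

# A low ability on the second dimension is compensated by the first one:
# the weighted sum of deviations is positive, so this test taker is a master.
theta = (1.0, -0.5, 0.0)
l_m = loss_mastery(theta, theta_c, A, s=2, C=0.06)     # only the testing cost
l_n = loss_nonmastery(theta, theta_c, B, s=2, C=0.06)  # testing cost plus 0.5
```

In this sketch the mastery decision incurs only the cost of testing, whereas the incorrect non-mastery decision adds a loss proportional to the distance from the boundary.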

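Conjunctive loss functions. In a conjunctive loss structure, a test taker is considered a master 
if the proficiency is above a cut-off point on all dimensions, and is considered a non-master 
if proficiency is below a cut-off point on any dimension. In two dimensions, this could be 
translated into the following loss function. Define 

$$
L(m, \theta_1, \theta_2) = \begin{cases}
sC + A_1(\theta_1 - \theta_{1c}) + A_2(\theta_2 - \theta_{2c}) & \text{if } \theta_1 \le \theta_{1c} \text{ and } \theta_2 \le \theta_{2c} \\
sC + A_2(\theta_2 - \theta_{2c}) + A_3(\theta_1 - \theta_{1c})(\theta_2 - \theta_{2c}) & \text{if } \theta_1 > \theta_{1c} \text{ and } \theta_2 \le \theta_{2c} \\
sC + A_1(\theta_1 - \theta_{1c}) + A_4(\theta_1 - \theta_{1c})(\theta_2 - \theta_{2c}) & \text{if } \theta_1 \le \theta_{1c} \text{ and } \theta_2 > \theta_{2c} \\
sC & \text{if } \theta_1 > \theta_{1c} \text{ and } \theta_2 > \theta_{2c}
\end{cases} \tag{8}
$$

and 

$$
L(n, \theta_1, \theta_2) = \begin{cases}
sC + (\theta_1 - \theta_{1c})^{B_1} (\theta_2 - \theta_{2c})^{B_2} & \text{if } \theta_1 > \theta_{1c} \text{ and } \theta_2 > \theta_{2c} \\
sC & \text{otherwise,}
\end{cases} \tag{9}
$$

for $A_1, A_2, A_3, A_4 < 0$ and $B_1, B_2 > 0$. Both loss functions are continuous; $L(n, \theta_1, \theta_2)$ is 
strictly positive and increasing on the space where $L(m, \theta_1, \theta_2)$ is equal to $sC$, and in the same 
manner, $L(m, \theta_1, \theta_2)$ is strictly positive and decreasing on the space where $L(n, \theta_1, \theta_2)$ is $sC$. 
Notice that $L(m, \theta_1, \theta_2) = sC + A_1(\theta_1 - \theta_{1c})$ on the line $\theta_2 = \theta_{2c}$ (for $\theta_1 \le \theta_{1c}$), and $L(m, \theta_1, \theta_2) = 
sC + A_2(\theta_2 - \theta_{2c})$ on the line $\theta_1 = \theta_{1c}$ (for $\theta_2 \le \theta_{2c}$). 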

Coombs and Kao (1955) show that conjunctive and disjunctive models are isomorphic 
and only one mathematical model needs to be developed for the analysis of the problem. In 
the present case it is easily verified that choosing (8) as the definition for $L(n, \theta_1, \theta_2)$, (9) for 
the definition of $L(m, \theta_1, \theta_2)$ and setting $A_1, A_2, A_3, A_4 > 0$ and $B_1, B_2 < 0$ defines the loss 
structure for the disjunctive case. 
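
The conjunctive loss structure in (8) and (9) can be sketched as follows. This is a minimal illustration under two assumptions that are not stated explicitly in the text: the boundaries are assigned to the "below cut-off" regions, and $B_1$ and $B_2$ enter (9) as exponents. The function names are hypothetical.

```python
def loss_mastery_conj(theta, theta_c, A, s, C):
    """Loss (8) of a mastery decision in two dimensions."""
    d1, d2 = theta[0] - theta_c[0], theta[1] - theta_c[1]
    A1, A2, A3, A4 = A
    if d1 <= 0 and d2 <= 0:
        return s * C + A1 * d1 + A2 * d2
    if d1 > 0 and d2 <= 0:
        return s * C + A2 * d2 + A3 * d1 * d2
    if d1 <= 0 and d2 > 0:
        return s * C + A1 * d1 + A4 * d1 * d2
    return s * C  # above the cut-off on both dimensions: a true master

def loss_nonmastery_conj(theta, theta_c, B, s, C):
    """Loss (9) of a non-mastery decision (B_1, B_2 assumed to be exponents)."""
    d1, d2 = theta[0] - theta_c[0], theta[1] - theta_c[1]
    if d1 > 0 and d2 > 0:
        return s * C + d1 ** B[0] * d2 ** B[1]
    return s * C

theta_c = (0.0, 0.0)
A, B = (-0.5, -0.5, -0.5, -0.5), (1.0, 1.0)

# High ability on one dimension does not compensate for a low ability on the
# other: declaring mastery for theta = (1, -1) still incurs a loss above sC.
l_m = loss_mastery_conj((1.0, -1.0), theta_c, A, s=1, C=0.01)
l_n = loss_nonmastery_conj((1.0, -1.0), theta_c, B, s=1, C=0.01)
```

With these values the mastery decision for $(\theta_1, \theta_2) = (1, -1)$ is penalized, while the non-mastery decision incurs only the cost of testing, which is the conjunctive behavior described above.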

At stage $s$, the decision whether the respondent is a master or a non-master, or whether 
another testlet will be administered, is based on the expected losses of the three possible 
decisions given the observation $w_s$. The expected losses of the first two decisions are computed 
as 

$$
E(L(m, \boldsymbol{\theta}) \mid w_s) = \int \cdots \int L(m, \boldsymbol{\theta})\, p(\boldsymbol{\theta} \mid w_s)\, d\boldsymbol{\theta} \tag{10}
$$

and 

$$
E(L(n, \boldsymbol{\theta}) \mid w_s) = \int \cdots \int L(n, \boldsymbol{\theta})\, p(\boldsymbol{\theta} \mid w_s)\, d\boldsymbol{\theta}, \tag{11}
$$

where $p(\boldsymbol{\theta} \mid w_s)$ is the posterior density of $\boldsymbol{\theta}$ given $w_s$. The expected loss of the third 
possible decision is computed as the expected risk of continuing testing. If the expected risk of 
continuing testing is smaller than the expected loss of a master or a non-master decision, testing 
will be continued. The expected risk of continuing testing is defined as follows. 
Let $\{w_{s+1} \mid w_s\}$ be the range of $w_{s+1}$ given $w_s$. Then, for $s = 1, \ldots, S - 1$, the 
expected risk of continuing testing is defined as 

$$
E(R(w_{s+1}) \mid w_s) = \sum_{\{w_{s+1} \mid w_s\}} R(w_{s+1})\, p(w_{s+1} \mid w_s), \tag{12}
$$

where the so-called posterior predictive distribution $p(w_{s+1} \mid w_s)$ is given by 

$$
p(w_{s+1} \mid w_s) = \int \cdots \int p(w_{s+1} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta} \mid w_s)\, d\boldsymbol{\theta}, \tag{13}
$$

and risk is inductively defined as 

$$
R(w_{s+1}) = \min\{E(L(m, \boldsymbol{\theta}) \mid w_{s+1}),\; E(L(n, \boldsymbol{\theta}) \mid w_{s+1}),\; E(R(w_{s+2}) \mid w_{s+1})\}. \tag{14}
$$

The risk associated with the last testlet is defined as 

$$
R(w_S) = \min\{E(L(m, \boldsymbol{\theta}) \mid w_S),\; E(L(n, \boldsymbol{\theta}) \mid w_S)\}. \tag{15}
$$

So, given an observation $w_s$, the expected distribution of $w_{s+1}, w_{s+2}, \ldots, w_S$ is generated and 
an inference about future decisions is made. Based on these inferences, the expected risk of 
continuation (12) is computed and compared with the expected losses of a mastery or 
non-mastery decision. If the risk of continuation is smaller than these two expected losses, testing 
is continued. If this is not the case the classification decision with the smallest expected loss is 
made. 

Notice that the definitions (12) through (15) imply a recursive definition of the 
expected risk of continuation. In practice, the computation of the expected risk of continuing 
testing can be done by backward induction as follows. First, the risk of the last testlet is 
computed for all possible values of $w_S$. Then the posterior predictive distribution $p(w_S \mid w_{S-1})$ 
is computed using (13), followed by the expected risk $E(R(w_S) \mid w_{S-1})$ defined in (12). This, 
in turn, can be used for computing the risk $R(w_{S-1})$ for all $w_{S-1}$ using (14), and this iterative 
process continues until stage $s$ is reached and the decision can be made whether to administer testlet 
$s + 1$, or to decide on mastery or non-mastery. 
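
The recursion in (12) through (15) can be sketched in Python as follows. To keep the example short, several simplifying assumptions are made that go beyond the text: ability is unidimensional, responses follow a Rasch model so that the cumulative sum score can serve as the statistic $w_s$, the prior is standard normal and approximated on an equally spaced grid, the loss is the linear loss of (6) and (7) in one dimension, and testlets are administered in a fixed order. All function and variable names are hypothetical.

```python
import math
from functools import lru_cache

def esf(b):
    """Elementary symmetric functions gamma_0, ..., gamma_K of exp(-b_i)."""
    gam = [1.0]
    for b_i in b:
        eps = math.exp(-b_i)
        gam = [(gam[t] if t < len(gam) else 0.0) + (gam[t - 1] * eps if t > 0 else 0.0)
               for t in range(len(gam) + 1)]
    return gam

def make_solver(testlets, cost, theta_c=0.0, A=-1.0, B=1.0, n_grid=61):
    grid = [-4.0 + 8.0 * k / (n_grid - 1) for k in range(n_grid)]
    prior = [math.exp(-0.5 * th * th) for th in grid]
    S = len(testlets)
    gammas = [esf(b) for b in testlets]
    # log-probability of an all-zero response pattern per testlet and grid point
    logp0 = [[-sum(math.log1p(math.exp(th - b_i)) for b_i in b) for th in grid]
             for b in testlets]

    def posterior(s, r):
        """Posterior weights on the grid after s testlets with total score r."""
        w = [prior[k] * math.exp(r * grid[k] + sum(logp0[j][k] for j in range(s)))
             for k in range(n_grid)]
        tot = sum(w)
        return [x / tot for x in w]

    def expected_losses(s, r):
        post, sc = posterior(s, r), s * cost
        el_m = sum(p * max(sc, sc + A * (th - theta_c)) for p, th in zip(post, grid))
        el_n = sum(p * max(sc, sc + B * (th - theta_c)) for p, th in zip(post, grid))
        return el_m, el_n

    def continuation(s, r):
        """Expected risk (12) of administering testlet s + 1."""
        post, total = posterior(s, r), 0.0
        for z, g in enumerate(gammas[s]):  # possible scores on testlet s + 1
            p_z = sum(p * g * math.exp(z * th + logp0[s][k])
                      for k, (p, th) in enumerate(zip(post, grid)))
            total += p_z * risk(s + 1, r + z)
        return total

    @lru_cache(maxsize=None)
    def risk(s, r):
        """Risk (14); at the final stage S it reduces to (15)."""
        el_m, el_n = expected_losses(s, r)
        return min(el_m, el_n) if s == S else min(el_m, el_n, continuation(s, r))

    def decide(s, r):
        el_m, el_n = expected_losses(s, r)
        options = {"m": el_m, "n": el_n}
        if s < S:
            options["c"] = continuation(s, r)
        return min(options, key=options.get)

    return decide, risk

# Three hypothetical testlets of three items; cost per testlet is an assumed value.
decide, risk = make_solver([[-0.5, 0.0, 0.5]] * 3, cost=0.06)
```

A call such as `decide(1, 2)` returns `"m"`, `"n"` or `"c"` for a test taker with a score of 2 on the first testlet. The memoized `risk` function plays the role of the table that backward induction fills from stage $S$ down to the current stage.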
The Compound Multidimensional Rasch model 



The theory presented thus far is applicable to the broad class of multidimensional IRT 
models defined above. The theory of adaptive sequential mastery testing will now be worked out 
in detail for a special case of the general model. In this so-called compound multidimensional 
Rasch model (Glas, 1992), it is assumed that the complete test, or, in the present case, the 
complete testlet, consists of $Q$ sub-tests, where every sub-test relates to a specific ability $\theta_q$, 
$q = 1, \ldots, Q$. Further, it is assumed that the ensemble of person parameters $\theta_1, \ldots, \theta_Q$ has a 
$Q$-variate normal distribution with a mean equal to zero and a covariance matrix $\Sigma$. 

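Given $\theta_1, \ldots, \theta_Q$, the probability of a response pattern $\mathbf{u}_1, \ldots, \mathbf{u}_Q$ is given by 

$$
p(\mathbf{u}_1, \ldots, \mathbf{u}_Q \mid \theta_1, \ldots, \theta_Q) = \prod_{q=1}^{Q} \exp(t_q\theta_q - \mathbf{u}_q'\mathbf{b}_q)\, P_{q0}(\theta_q), \tag{16}
$$

where $\mathbf{b}_q = (b_{q1}, \ldots, b_{qK_q})'$ is a vector of item parameters, $\mathbf{u}_q'\mathbf{b}_q$ is the inner product of $\mathbf{u}_q$ and $\mathbf{b}_q$, 
$t_q = \sum_{i=1}^{K_q} u_{qi}$ is the sum score, and 

$$
P_{q0}(\theta_q) = \prod_{i=1}^{K_q} \left[1 + \exp(\theta_q - b_{qi})\right]^{-1}. \tag{17}
$$

Notice that $t_q$ is the minimal sufficient statistic for $\theta_q$. Further, it is easily verified that $P_{q0}(\theta_q)$ 
is the probability, given $\theta_q$, of a response pattern with all item responses equal to zero. The 
probability of observing $t_q$ given $\theta_q$ is given by 

$$
p(t_q \mid \theta_q) = \sum_{\{\mathbf{u}_q \mid t_q\}} p(\mathbf{u}_q \mid \theta_q) = \sum_{\{\mathbf{u}_q \mid t_q\}} \exp(t_q\theta_q - \mathbf{u}_q'\mathbf{b}_q)\, P_{q0}(\theta_q) = \gamma_{t_q}(\mathbf{b}_q) \exp(t_q\theta_q)\, P_{q0}(\theta_q),
$$

with $\gamma_{t_q}(\mathbf{b}_q)$ an elementary symmetric function defined by 

$$
\gamma_{t_q}(\mathbf{b}_q) = \sum_{\{\mathbf{u}_q \mid t_q\}} \exp(-\mathbf{u}_q'\mathbf{b}_q),
$$

and where $\{\mathbf{u}_q \mid t_q\}$ stands for the set of all possible response patterns resulting in a sum score 
$t_q$. 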
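
The elementary symmetric functions need not be computed by enumerating response patterns. The sketch below uses the standard one-item-at-a-time recursion (an implementation choice, not taken from the text) and checks the resulting sum-score probability against brute-force enumeration for one sub-test. Function names and item difficulties are hypothetical.

```python
import math
from itertools import product

def esf(b):
    """Elementary symmetric functions gamma_0, ..., gamma_K of exp(-b_i)."""
    gam = [1.0]
    for b_i in b:
        eps = math.exp(-b_i)
        # adding one item: gamma_t(new) = gamma_t(old) + eps * gamma_{t-1}(old)
        gam = [(gam[t] if t < len(gam) else 0.0) + (gam[t - 1] * eps if t > 0 else 0.0)
               for t in range(len(gam) + 1)]
    return gam

def p_score(t, theta, b):
    """p(t | theta) = gamma_t(b) * exp(t * theta) * P_0(theta) for one sub-test."""
    p0 = 1.0
    for b_i in b:
        p0 /= 1.0 + math.exp(theta - b_i)
    return esf(b)[t] * math.exp(t * theta) * p0

def p_score_brute(t, theta, b):
    """The same probability, summing over all patterns with sum score t."""
    total = 0.0
    for u in product((0, 1), repeat=len(b)):
        if sum(u) == t:
            prob = 1.0
            for u_i, b_i in zip(u, b):
                p = 1.0 / (1.0 + math.exp(-(theta - b_i)))
                prob *= p if u_i else 1.0 - p
            total += prob
    return total

b = [-0.8, 0.1, 0.6, 1.2]   # hypothetical item difficulties of one sub-test
theta = 0.3
probs = [p_score(t, theta, b) for t in range(len(b) + 1)]
```

The recursion needs on the order of $K_q^2$ operations, whereas enumeration grows as $2^{K_q}$, which is why the aggregated statistic keeps the computations feasible.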

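Given $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_Q)$, the probability of a response pattern $\mathbf{u} = (\mathbf{u}_1, \ldots, \mathbf{u}_Q)$ is given 
by 

$$
p(\mathbf{u} \mid \boldsymbol{\theta}) = \prod_{q=1}^{Q} \exp(t_q\theta_q) \exp(-\mathbf{u}_q'\mathbf{b}_q)\, P_{q0}(\theta_q) = \exp(\mathbf{t}'\boldsymbol{\theta}) \exp(-\mathbf{u}'\mathbf{b})\, P_0(\boldsymbol{\theta}),
$$

where $\mathbf{b} = (\mathbf{b}_1, \ldots, \mathbf{b}_Q)$ is a vector of item parameters, $\mathbf{t} = (t_1, \ldots, t_Q)$ and 

$$
P_0(\boldsymbol{\theta}) = \prod_{q=1}^{Q} P_{q0}(\theta_q).
$$

The probability of observing $\mathbf{t}$ given $\boldsymbol{\theta}$ is given by 

$$
p(\mathbf{t} \mid \boldsymbol{\theta}) = \Gamma_{\mathbf{t}}(\mathbf{b}) \exp(\mathbf{t}'\boldsymbol{\theta})\, P_0(\boldsymbol{\theta}),
$$

where $\Gamma_{\mathbf{t}}(\mathbf{b})$ is the product of the elementary symmetric functions $\gamma_{t_q}(\mathbf{b}_q)$ for $q = 1, \ldots, Q$. Below, 
$\Gamma_{\mathbf{t}}(\mathbf{b})$ will be referred to as a compound elementary symmetric function. 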

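Usually the prior for $\boldsymbol{\theta}$ is standard normal, so let $g(\boldsymbol{\theta} \mid \Sigma)$ be the normal density with mean 
zero and covariance matrix $\Sigma$. Then 

$$
p(\boldsymbol{\theta} \mid \mathbf{t}) = \frac{p(\mathbf{t} \mid \boldsymbol{\theta})\, g(\boldsymbol{\theta} \mid \Sigma)}{p(\mathbf{t})} = \frac{\exp(\mathbf{t}'\boldsymbol{\theta})\, P_0(\boldsymbol{\theta})\, g(\boldsymbol{\theta} \mid \Sigma)}{\int \cdots \int \exp(\mathbf{t}'\boldsymbol{\theta})\, P_0(\boldsymbol{\theta})\, g(\boldsymbol{\theta} \mid \Sigma)\, d\boldsymbol{\theta}}.
$$

Notice that $\Gamma_{\mathbf{t}}(\mathbf{b})$ cancels from the numerator and denominator. 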
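
As an illustration of this posterior, the following sketch approximates the posterior means of a two-dimensional ability vector on a rectangular grid. It assumes unit prior variances and a prior correlation `rho`, and it uses a plain grid sum rather than a proper quadrature rule; the function names and the data are hypothetical.

```python
import math
import itertools

def log_p0(theta_q, b_q):
    """Log-probability of an all-zero response pattern on one sub-test."""
    return -sum(math.log1p(math.exp(theta_q - b)) for b in b_q)

def posterior_mean(t, b, rho, n_grid=41, lim=4.0):
    """Posterior means of (theta_1, theta_2) given the sum scores t = (t_1, t_2)."""
    grid = [-lim + 2.0 * lim * k / (n_grid - 1) for k in range(n_grid)]
    det = 1.0 - rho * rho
    tot, m1, m2 = 0.0, 0.0, 0.0
    for th1, th2 in itertools.product(grid, grid):
        # bivariate normal prior with unit variances and correlation rho
        log_prior = -(th1 * th1 - 2.0 * rho * th1 * th2 + th2 * th2) / (2.0 * det)
        # exp(t'theta) * P_0(theta); the symmetric functions cancel and are omitted
        log_lik = t[0] * th1 + t[1] * th2 + log_p0(th1, b[0]) + log_p0(th2, b[1])
        w = math.exp(log_prior + log_lik)
        tot += w
        m1 += w * th1
        m2 += w * th2
    return m1 / tot, m2 / tot

# Three items answered correctly on dimension 1, no items yet on dimension 2.
b = ([0.0, 0.0, 0.0], [])
m1_hi, m2_hi = posterior_mean((3, 0), b, rho=0.80)
m1_lo, m2_lo = posterior_mean((3, 0), b, rho=0.40)
```

In this example the responses on the first dimension also shift the posterior mean on the second dimension, and more so when the correlation between the dimensions is higher.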

Applying the general framework of the previous section to the Rasch model boils down 
to choosing the minimal sufficient statistics for $\boldsymbol{\theta}$, that is, the unweighted sum scores, for the 
statistics $w_s$. So let $t_{sq}$ be the sum score on the $q$-th sub-test at the $s$-th occasion. Further, 
define $\mathbf{r}_s$ as a $Q$-vector with elements $r_{sq} = \sum_{j=1}^{s} t_{jq}$. Let $p(\boldsymbol{\theta} \mid \mathbf{r}_s)$ stand for the posterior 
density of proficiency given $\mathbf{r}_s$. Then the expected losses (10), (11) and the expected risk (12) 
can be written as $E(L(m, \boldsymbol{\theta}) \mid \mathbf{r}_s)$, $E(L(n, \boldsymbol{\theta}) \mid \mathbf{r}_s)$ and $E(R(\mathbf{r}_{s+1}) \mid \mathbf{r}_s)$. More specifically, the 
expected risk is given by 

$$
E(R(\mathbf{r}_{s+1}) \mid \mathbf{r}_s) = \sum_{\{\mathbf{r}_{s+1} \mid \mathbf{r}_s\}} p(\mathbf{r}_{s+1} \mid \mathbf{r}_s)\, R(\mathbf{r}_{s+1}), \tag{18}
$$

where the summation is over all scores $\mathbf{r}_{s+1}$ compatible with $\mathbf{r}_s$. Defining $\mathbf{z}_{s+1} = \mathbf{r}_{s+1} - \mathbf{r}_s$, the 
posterior predictive distribution (13) specializes to 

$$
p(\mathbf{r}_{s+1} \mid \mathbf{r}_s) = \int \cdots \int \Gamma_{\mathbf{z}_{s+1}} \exp(\mathbf{z}_{s+1}'\boldsymbol{\theta})\, P_{0(s+1)}(\boldsymbol{\theta})\, p(\boldsymbol{\theta} \mid \mathbf{r}_s)\, d\boldsymbol{\theta}, \tag{19}
$$

where $\Gamma_{\mathbf{z}_{s+1}}$ is a shorthand notation for a compound elementary symmetric function of the item 
parameters of occasion $s+1$ and $P_{0(s+1)}(\boldsymbol{\theta})$ is equal to (17) evaluated using the item parameters 
of test $s+1$. That is, $P_{0(s+1)}(\boldsymbol{\theta})$ is equal to the probability of a zero response pattern on test 
$s+1$, given $\boldsymbol{\theta}$. 



Simulation studies 

A simulation study was designed to investigate the following four research questions. 
(1) What is the performance, in terms of average loss, of multidimensional IRT-based sequential 
mastery testing as a function of the number of items administered per testing stage? (2) What 
are the effects on average loss when turning the sequential procedure into an adaptive sequential 
procedure? (3) How is average loss in the sequential procedure influenced when ignoring the 
multidimensional structure and using a unidimensional IRT model? And finally, (4) how does 
ignoring the multidimensional structure affect the adaptive sequential procedure in terms of 
average loss? 

Compensatory loss functions. For all simulations pertaining to compensatory loss functions, a 
three-dimensional compound Rasch model was used. The parameters of the loss function were 
$(A_1, A_2, A_3) = (-1, -1, -1)$ and $(B_1, B_2, B_3) = (1, 1, 1)$, while the cost of administering one 
item was set equal to 0.02. The cut-off point was set equal to $\boldsymbol{\theta}_c = \mathbf{0}$. 

In the studies, the following aspects were varied: 

• The correlation between the latent dimensions. The three-dimensional compound Rasch 
model was simulated in two conditions: a high-homogeneity condition where the correlation 
between all three dimensions was $\rho = 0.80$ and a low-homogeneity condition where this 
correlation was $\rho = 0.40$. 

• The test administration design. In the test procedure 27 items could be delivered. These 
items could be delivered as a fixed test of 27 items, or in a sequential design with 3 stages 
with 9 items per stage, 9 stages of 3 items, and 27 stages of one item. 

• The test administration mode. Test administration could be either sequential or adaptive 
sequential. For the sequential procedure, the item difficulties $b_i$ were drawn from a standard 
normal distribution. Further, the items were evenly distributed over the three ability 
dimensions, that is, a third of the items loaded on the first dimension, a third on the second, and a 
third on the third dimension. Finally, also within a stage the items were evenly distributed 
over the three dimensions, with the exception of the one-item stages, where items alternately 
loaded on a dimension. The item parameters were redrawn in every replication. For the 
adaptive sequential mode, a testlet bank was generated in such a way that it could be expected 
that it supported selection of testlets with differential optimal measurement properties. For 
the design of 27 stages of one item each, this was simply translated into drawing 375 item 
difficulties for each ability dimension from the standard normal distribution and choosing 
the optimal item via a selection criterion that will be outlined below. For the procedures 
with 3 and 9 stages, the following procedure was adopted: 



- define the grid $\{\mathbf{h}\} = \{h_1, h_2, h_3\} = \{h(i), h(j), h(k) \mid i, j, k = 1, \ldots, 5,\; h(n) = -1.0 + 
0.5(n - 1)\}$. Notice that this grid has $5^3$, that is, 125 points. 

- for each point $\mathbf{h} \in \{\mathbf{h}\}$, draw 3 item difficulties from the multivariate normal distribution 
defined by $N(\mathbf{h}, 0.2\mathbf{I})$. Each item is assumed to load on a different dimension. This is 
repeated 3 times for each point $\mathbf{h} \in \{\mathbf{h}\}$. For the procedure with 3 stages, the 9 items 
form one testlet; for the procedure with 9 stages, three testlets of 3 items are formed. 
In this manner the total number of items available for the three procedures (27, 9 and 3 
stages) remains constant, that is, equal to 1125. 

Also for the adaptive mode, the item difficulties were redrawn in every replication. 
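
The construction of the testlet bank for the 3-stage and 9-stage designs can be sketched as follows. The sketch assumes that 0.2 in $N(\mathbf{h}, 0.2\mathbf{I})$ denotes a variance; the function name `build_bank`, its parameters, and the seed are hypothetical.

```python
import itertools
import math
import random

def build_bank(n_dim=3, n_rep=3, var=0.2, seed=0):
    """Draw n_rep sets of n_dim item difficulties around every grid point h."""
    rng = random.Random(seed)
    levels = [-1.0 + 0.5 * (n - 1) for n in range(1, 6)]   # h(1), ..., h(5)
    sd = math.sqrt(var)                                    # 0.2 taken as a variance
    bank = {}
    for h in itertools.product(levels, repeat=n_dim):
        # each draw holds one item per dimension, centered at that coordinate of h
        bank[h] = [[rng.gauss(h_q, sd) for h_q in h] for _ in range(n_rep)]
    return bank

bank = build_bank()

# 9-stage design: every draw of 3 items is a testlet (375 testlets).
testlets_of_3 = [draw for draws in bank.values() for draw in draws]
# 3-stage design: the 9 items of a grid point form one testlet (125 testlets).
testlets_of_9 = [[b for draw in draws for b in draw] for draws in bank.values()]

n_items = sum(len(testlet) for testlet in testlets_of_3)
```

Calling `build_bank(n_dim=2, n_rep=4)` would give the analogous two-dimensional bank with 25 grid points.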

The choice of a criterion for adaptive testlet selection in a multidimensional framework is more 
complicated than in a unidimensional framework. In the latter framework, Vos and Glas (2000) 
studied three selection criteria. The first two entailed the choice of the testlet with maximum 
information at the cut-off point and at the expected-a-posteriori estimate of ability, respectively. 
In the multi-dimensional framework, these two criteria are less plausible. In one dimension, 
both the running estimate of ability and the cut-off point are on the same continuum, and any 
test with high information between these two points will be informative for the decision that has 
to be taken. In a multidimensional framework, the test taker’s ability is a point in Q-dimensional 
space and the boundary between masters and non-masters becomes a line in two dimensions, or 
a linear manifold in more than two dimensions. Therefore, in this case the relation between the 
position of the test taker in the support of the loss function and the optimal testlet will be much 
more complicated, and remains a point of further study. 

As an alternative, the third criterion studied by Vos and Glas (2000) will be used. This 
approach is motivated by the fact that one is primarily interested in minimizing possible losses 
due to misclassifications. The sequential procedure is based on comparing $L(m, \boldsymbol{\theta})$ and $L(n, \boldsymbol{\theta})$ 
to come to a decision. If, for every possible follow-up testlet $s + 1$, the observation $w_{s+1}$ 
were available, a natural choice for the follow-up test would be the testlet where the posterior variance 
of the difference between $L(m, \boldsymbol{\theta})$ and $L(n, \boldsymbol{\theta})$, say $\mathrm{var}(L(m, \boldsymbol{\theta}) - L(n, \boldsymbol{\theta}) \mid w_{s+1})$, is 
minimal. However, the observation $w_{s+1}$ is not yet available, so a prediction must be made of 
the likelihood of $w_{s+1}$. This likelihood is obtained via the predictive distribution $p(w_{s+1} \mid w_s)$. 
So if $\{w_{s+1} \mid w_s\}$ is the set of all possible values $w_{s+1}$ given $w_s$, the criterion for selection of 
the next testlet becomes 

$$
\sum_{\{w_{s+1} \mid w_s\}} \mathrm{var}(L(m, \boldsymbol{\theta}) - L(n, \boldsymbol{\theta}) \mid w_{s+1})\, p(w_{s+1} \mid w_s), \tag{20}
$$

that is, a testlet is chosen such that the expected variance of the difference between the losses 
of the mastery and non-mastery decision is minimal. In the study on the unidimensional case 
by Vos and Glas (2000) the performance of the three selection criteria was comparable, with a 
slight advantage for the procedure based on maximum information at the cut-off point. 
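
Criterion (20) can be sketched in Python as follows. For brevity the sketch is unidimensional and uses a Rasch testlet with the sum score as the observation, which is a simplification of the multidimensional procedure described here; the grid, the two candidate testlets, and all function names are hypothetical.

```python
import math

def esf(b):
    """Elementary symmetric functions gamma_0, ..., gamma_K of exp(-b_i)."""
    gam = [1.0]
    for b_i in b:
        eps = math.exp(-b_i)
        gam = [(gam[t] if t < len(gam) else 0.0) + (gam[t - 1] * eps if t > 0 else 0.0)
               for t in range(len(gam) + 1)]
    return gam

def expected_variance(post, grid, b, loss_diff):
    """Criterion (20) for a candidate testlet with item difficulties b."""
    log_p0 = [-sum(math.log1p(math.exp(th - b_i)) for b_i in b) for th in grid]
    total = 0.0
    for z, g in enumerate(esf(b)):                    # all possible testlet scores
        joint = [p * g * math.exp(z * th + lp) for p, th, lp in zip(post, grid, log_p0)]
        p_z = sum(joint)                              # posterior predictive p(z | w_s)
        new_post = [j / p_z for j in joint]           # posterior after observing z
        mean = sum(w * loss_diff(th) for w, th in zip(new_post, grid))
        var = sum(w * (loss_diff(th) - mean) ** 2 for w, th in zip(new_post, grid))
        total += p_z * var
    return total

def select_testlet(post, grid, bank, loss_diff):
    """Index of the testlet in the bank that minimizes criterion (20)."""
    return min(range(len(bank)),
               key=lambda j: expected_variance(post, grid, bank[j], loss_diff))

grid = [-4.0 + 0.1 * k for k in range(81)]
weights = [math.exp(-0.5 * th * th) for th in grid]
norm = sum(weights)
post = [w / norm for w in weights]                    # current posterior (here: prior)

def loss_diff(th):
    """L(m, theta) - L(n, theta) for A = -1, B = 1, cut-off 0; sC cancels."""
    return max(0.0, -th) - max(0.0, th)

bank = [[0.0, 0.0, 0.0], [3.0, 3.0, 3.0]]             # two candidate testlets
best = select_testlet(post, grid, bank, loss_diff)
```

With the posterior centered at the cut-off point, the testlet with difficulties near zero yields the smaller expected variance and is selected in this example.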



Insert Table 1 and 2 about here 



The results for the simulation studies for $\rho = 0.80$ and $\rho = 0.40$ are reported in 
Table 1 and Table 2, respectively. The results shown are based on 1000 replications. For 
every replication a true ability $\boldsymbol{\theta}$ was drawn from the standard normal distribution. At the 
end of every replication, loss was computed using the true ability value. In Table 1, it can 
be seen that the mean loss decreased as the number of items in a testlet decreased. This decrease can 
be attributed to a decrease in the number of items given. The proportion of correct decisions 
did not decrease; in fact, it slightly increased. Finally, it can be seen that using an adaptive 
testlet selection procedure further decreased mean loss, but this decrease was far less important 
than the decrease attributable to the decrease of the testlet size. These findings are analogous to the 
findings of Vos and Glas (2000) for the unidimensional case. 

The results for the study in the condition with $\rho = 0.40$ are shown in Table 2. It can 
be seen that the results are analogous to the results in Table 1, with the exception that all mean 
losses are systematically larger than in the condition where $\rho = 0.80$. This is explained by the 
fact that in the case of a homogeneous item pool, item responses are informative with respect to 
all ability dimensions, while in the heterogeneous case, item responses are mainly informative 
with respect to the ability on which they load. 



Insert Table 3 and 4 about here 



In Table 3 and Table 4, the results are given for the conditions where the 
multidimensional ability structure is ignored in the computations supporting the sequential 
and adaptive sequential procedure. In this condition, response behavior was generated and 
the final mean losses were computed using the ‘true’ item and ‘true’ multidimensional ability 
parameters, while the computations supporting the sequential and adaptive sequential procedure 
were made using a standard unidimensional Rasch model with the ‘true’ item difficulties $b_i$ 
and unidimensional standard normally distributed ability parameters. One could view this 
unidimensional approximation of multidimensional response behavior as an approximation 
based on the assumption that the correlation between the latent abilities is equal to one, i.e., 
$\rho = 1.0$. Therefore, in the unidimensional case, the losses (6) and (7) were computed using 
$\theta_1 = \theta_2 = \theta_3 = \theta$, where $\theta$ has a standard normal prior, and $\theta_{c1} = \theta_{c2} = \theta_{c3} = \theta_c = 0$. The 
results for the condition with $\rho = 0.80$ are shown in Table 3, the results for the condition with 
$\rho = 0.40$ are shown in Table 4. 

It can be seen that, in general, the mean losses were higher than the analogous losses in 
Table 1 and 2, but the increase of the loss remained limited. Therefore, it must be concluded that 
the unidimensional approximation based on the assumption $\rho = 1.0$ worked quite well. Further, 
one might expect that the approximation in the case where $\rho = 0.40$ would be worse, but this 
expectation was not confirmed by the results. An important exception was the case of adaptive 
testlet selection with 27 testlets of one item each. In that case, the average loss for the adaptive 
sequential procedure became higher than the average loss in the non-adaptive sequential testlet 
selection procedure. So there the combination of a unidimensional approximation of ability with 
the circumstance that the testlets only loaded on one ability dimension resulted in a relatively 
poor performance. 
Conjunctive loss functions. For all simulations pertaining to conjunctive loss functions, a 
two-dimensional compound Rasch model was used. The parameters of the loss function $A_1, \ldots, A_4$ 
were all equal to $-0.5$, and the parameters $B_1$ and $B_2$ were both equal to 1. The cost of 
administering one item was set equal to 0.01 and the cut-off point was set equal to $\boldsymbol{\theta}_c = \mathbf{0}$. 

In the studies, the following aspects were varied: 

• The correlation between the latent dimensions: $\rho = 0.80$ and $\rho = 0.40$.

• The test administration design. In the test procedure 32 items could be delivered. These 
items could be delivered as a fixed test of 32 items, or in a sequential design with 4 stages 
with 8 items per stage, 8 stages of 4 items, and 32 stages of one item. 

• The test administration mode: sequential or adaptive sequential. For the sequential
procedure, the item difficulties were drawn from a standard normal distribution. Further,
the items were evenly distributed over the two ability dimensions, that is, half of the items
loaded on the first dimension and half loaded on the second dimension. Within a stage,
the items were also evenly distributed over the two dimensions, with the exception of the
one-item stages, where items loaded alternately on the two dimensions. The item parameters were
redrawn in every replication. For the adaptive sequential mode, a testlet bank was generated
in such a way that it could be expected to support the selection of testlets with differential
optimal measurement properties. For the design of 32 stages of one item each, this was
simply translated into drawing 100 item difficulties for each ability dimension from the
standard normal distribution and choosing the optimal item via a selection criterion that will be
outlined below. For the procedures with 8 and 4 stages, the following procedure was
adopted:

- define the grid $\{\mathbf{h}\} = \{(h_1, h_2)\} = \{(h(i), h(j)) \mid i, j = 1, \ldots, 5,\ h(n) = -1.0 + 0.5(n - 1)\}$.
Notice that this grid has $5^2$, that is, 25 points.

- for each point $\mathbf{h} \in \{\mathbf{h}\}$, draw 2 item difficulties from the multivariate normal distribution
$N(\mathbf{h}, 0.2\mathbf{I})$. Each item is assumed to load on a different dimension. This is
repeated 4 times for each point $\mathbf{h} \in \{\mathbf{h}\}$, which yields 8 items per grid point. For the
procedure with 4 stages, the 8 items form one testlet; for the procedure with 8 stages, two testlets
of 4 items are formed. In this manner the total number of items available for the three procedures
(32, 8 and 4 stages) remains constant, that is, equal to 200.

Also for the adaptive mode, the item difficulties were redrawn in every replication. 
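
The bank construction for the designs with 4 and 8 stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the value 0.2 is read as a variance, the two 4-item testlets at a grid point are formed by splitting the 8 items in drawing order, and the function names `make_grid` and `make_bank` are hypothetical.

```python
import random


def make_grid():
    """Return the 25 grid points (h1, h2) with h(n) = -1.0 + 0.5 * (n - 1), n = 1..5."""
    levels = [-1.0 + 0.5 * (n - 1) for n in range(1, 6)]
    return [(h1, h2) for h1 in levels for h2 in levels]


def make_bank(items_per_testlet, rng, variance=0.2, repeats=4):
    """Build a testlet bank for testlets of 8 items (4 stages) or 4 items (8 stages).

    Each testlet is a list of (difficulty, dimension) pairs, with the
    dimension coded as 0 or 1.
    """
    if items_per_testlet not in (4, 8):
        raise ValueError("this sketch covers testlets of 4 or 8 items only")
    sd = variance ** 0.5  # assumption: 0.2 is a variance, not a standard deviation
    bank = []
    for h in make_grid():
        items = []
        for _ in range(repeats):
            # one draw gives two difficulties, one item per ability dimension
            for dim in (0, 1):
                items.append((rng.gauss(h[dim], sd), dim))
        # split the 8 items at this grid point into one or two testlets
        for start in range(0, len(items), items_per_testlet):
            bank.append(items[start:start + items_per_testlet])
    return bank
```

With `rng = random.Random(0)`, `make_bank(8, rng)` gives 25 testlets and `make_bank(4, rng)` gives 50, so both banks hold 200 items, matching the count stated above.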



Insert Tables 5 to 8 about here
Unlike the unidimensional approximation of the compensatory model, the
unidimensional approximation of the conjunctive model does not work well. The reason for
the poor performance of the unidimensional approximation of the two-dimensional conjunctive
model is that there are many non-masters in the regions $\{\theta_1 > \theta_{1c} \text{ and } \theta_2 < \theta_{2c}\}$
and $\{\theta_1 < \theta_{1c} \text{ and } \theta_2 > \theta_{2c}\}$ who still obtain a sum score high enough to make
them eligible for a mastery decision in the unidimensional approximation. This happens through a
compensatory process in which a low ability and a low sum score on one dimension are compensated
by a high ability and a high sum score on the other dimension. This can be seen in Table 9, where
the proportion of non-masters wrongly classified as masters is much higher than the proportion of
masters wrongly classified as non-masters. Notice that this proportion is negatively related to
the correlation. In the compensatory model, the error proportions are symmetric and
approximately equal to 0.10.
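
The share of examinees falling in the two problematic regions can be computed in closed form, which helps explain why the misclassification is worse at the lower correlation. The sketch below is an added illustration with hypothetical function names. It assumes standard bivariate normal abilities with correlation $\rho$ and both cut-off points at 0, and uses the standard orthant probability $P(\theta_1 > 0, \theta_2 > 0) = 1/4 + \arcsin(\rho) / (2\pi)$, with a small Monte Carlo check.

```python
import math
import random


def master_proportion(rho):
    """P(theta_1 > 0 and theta_2 > 0) for a standard bivariate normal."""
    return 0.25 + math.asin(rho) / (2.0 * math.pi)


def one_sided_nonmasters(rho):
    """Probability that exactly one of the two abilities exceeds the cut-off 0."""
    return 0.5 - math.asin(rho) / math.pi


def simulate_master_proportion(rho, n=200_000, seed=1):
    """Monte Carlo estimate of master_proportion(rho)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        theta1 = z1
        theta2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2  # corr(theta1, theta2) = rho
        hits += (theta1 > 0.0 and theta2 > 0.0)
    return hits / n
```

Under these assumptions the formula gives master proportions of roughly 0.32 for $\rho = 0.40$ and 0.40 for $\rho = 0.80$, close to the simulated row totals 0.31 and 0.39 in Table 9. The mass in the two one-sided regions is roughly 0.37 for $\rho = 0.40$ against 0.20 for $\rho = 0.80$, in line with the negative relation to the correlation noted above.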



Insert Table 9 about here 



Conclusions and Further Research 

In this article, a general theoretical framework for non-adaptive and adaptive sequential
testing based on a combination of Bayesian sequential decision theory and multidimensional
IRT was presented. This framework was applied to the compound Rasch model. In this model
it is assumed that the test items can be split up into a number of subsets related to specific ability
dimensions and that the relation between the dimensions is modeled by a covariance structure. Using
this model, a number of simulation studies were performed, which showed that increasing
the number of stages in a sequential mastery procedure resulted in a marked decrease in
average loss. Moving to adaptive sequential mastery testing further reduced average loss,
but this effect was far smaller than the effect of moving from a fixed test to a non-adaptive
sequential procedure. For the compensatory model, the results of the simulation studies showed
that ignoring the multidimensional structure and using a unidimensional approximation to the
multidimensional model did not generally result in an important increase in average loss. An
exception was adaptive sequential testing with only one item per testlet and a low correlation
between the ability dimensions. In that case, the average loss was higher than in the analogous
case without adaptive item selection. For the conjunctive model, the unidimensional
approximation was very poor.
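
The relative size of the two effects can be read off the tables. The following short Python sketch is an added illustration with hypothetical names. For the multidimensional procedures in Tables 1, 2, 5, and 6, it compares the loss reduction from moving from a fixed test to the sequential design with one item per stage against the further reduction from adaptive selection in that same design.

```python
# (fixed test, sequential with one item per stage, adaptive with one item per stage)
MEAN_LOSS = {
    ("compensatory", 0.80): (0.7079, 0.3446, 0.3060),  # Table 1
    ("compensatory", 0.40): (0.7654, 0.4109, 0.3988),  # Table 2
    ("conjunctive", 0.80): (0.3549, 0.1277, 0.1270),   # Table 5
    ("conjunctive", 0.40): (0.3999, 0.1375, 0.1373),   # Table 6
}


def loss_reductions(fixed, sequential, adaptive):
    """Return (gain from sequential testing, extra gain from adaptive selection)."""
    return fixed - sequential, sequential - adaptive


def summary():
    """Both reductions per condition, rounded to four decimals."""
    return {key: tuple(round(gain, 4) for gain in loss_reductions(*losses))
            for key, losses in MEAN_LOSS.items()}
```

In every condition the first reduction is much larger than the second; for the conjunctive model the additional gain from adaptive selection is below 0.001.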

For application of the general framework for non-adaptive and adaptive sequential
testing presented here to more general multidimensional IRT models, two important issues will
require further research. Firstly, the computation of the multiple integrals is done using
Gauss-Hermite quadrature, which becomes very time-consuming when more than three dimensions
are involved (see, for instance, Glas, 1992). Therefore, problems of higher dimensionality
will need simulation methods for the evaluation of the multiple integrals. Secondly, many
multidimensional IRT models, such as the "Full Information Factor Analysis" model
by Bock, Gibbons, and Muraki (1988), have no sufficient statistics for $\theta$ and will need
alternative choices for $w_s = f(u_1, \ldots, u_s)$. For the unidimensional 3PL model, Vos and Glas
(2000) show that using unweighted sum scores results in a feasible procedure that produces
acceptable results. A generalization to a multidimensional framework would probably be based
on a Q-dimensional vector of partial sum scores, but this remains a point of further study.
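
The computational burden of the quadrature can be illustrated with a tensor-product Gauss-Hermite rule, whose number of evaluation points grows as $n^Q$ for $n$ nodes per dimension and $Q$ dimensions. The Python sketch below is an added illustration, not the implementation used in the study. It hard-codes the normalised 3-point rule for a standard normal weight (nodes $0, \pm\sqrt{3}$; weights $2/3, 1/6, 1/6$) and maps independent nodes to correlated abilities through a Cholesky factor. The function names are hypothetical.

```python
import math
from itertools import product

# Normalised 3-point Gauss-Hermite rule for a standard normal weight;
# exact for polynomials up to degree 5 in each coordinate.
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def cholesky(cov):
    """Lower-triangular L with L L^T = cov (cov must be positive definite)."""
    q = len(cov)
    low = [[0.0] * q for _ in range(q)]
    for i in range(q):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def expectation(func, cov):
    """Approximate E[func(theta)] for theta ~ N(0, cov).

    Returns the approximation and the number of grid points used.
    """
    q = len(cov)
    low = cholesky(cov)
    total, n_points = 0.0, 0
    for idx in product(range(len(NODES)), repeat=q):  # len(NODES) ** q points
        z = [NODES[i] for i in idx]
        theta = [sum(low[r][c] * z[c] for c in range(r + 1)) for r in range(q)]
        total += math.prod(WEIGHTS[i] for i in idx) * func(theta)
        n_points += 1
    return total, n_points
```

For a two-dimensional ability with correlation 0.8, `expectation(lambda t: t[0] * t[1], [[1.0, 0.8], [0.8, 1.0]])` recovers the correlation using 9 points, and the same rule needs 27 points in three dimensions and 81 in four. Practical applications would use more nodes per dimension, for instance from `numpy.polynomial.hermite_e.hermegauss`, which makes the growth correspondingly steeper.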
Table 1

Relation between selection method and loss
compensatory model, $\rho = 0.80$

Number of   Items per   Selection    Proportion   Proportion   Mean
Testlets    Testlet     Method       Correct      Testlets     Loss
                                     Decisions    Given

1           27          Fixed Test   0.81         1.00         0.7079
3           9           Sequential   0.79         0.38         0.4443
3           9           Adaptive     0.79         0.27         0.3777
9           3           Sequential   0.78         0.25         0.3972
9           3           Adaptive     0.78         0.25         0.3408
27          1           Sequential   0.79         0.25         0.3446
27          1           Adaptive     0.80         0.22         0.3060



Table 2

Relation between selection method and loss
compensatory model, $\rho = 0.40$

Number of   Items per   Selection    Proportion   Proportion   Mean
Testlets    Testlet     Method       Correct      Testlets     Loss
                                     Decisions    Given

1           27          Fixed Test   0.77         1.00         0.7654
3           9           Sequential   0.72         0.38         0.5387
3           9           Adaptive     0.73         0.26         0.4696
9           3           Sequential   0.73         0.25         0.4652
9           3           Adaptive     0.73         0.22         0.4169
27          1           Sequential   0.73         0.22         0.4109
27          1           Adaptive     0.73         0.21         0.3988


Table 3

Relation between selection method and loss
when multidimensionality is ignored
compensatory model, $\rho = 0.80$

Number of   Items per   Selection    Proportion   Proportion   Mean
Testlets    Testlet     Method       Correct      Testlets     Loss
                                     Decisions    Given

1           27          Fixed Test   0.81         1.00         0.6985
3           9           Sequential   0.81         0.41         0.4248
3           9           Adaptive     0.81         0.43         0.4138
9           3           Sequential   0.77         0.28         0.4074
9           3           Adaptive     0.80         0.27         0.3457
27          1           Sequential   0.80         0.27         0.3721
27          1           Adaptive     0.80         0.24         0.3295


Table 4

Relation between selection method and loss
when multidimensionality is ignored
compensatory model, $\rho = 0.40$

Number of   Items per   Selection    Proportion   Proportion   Mean
Testlets    Testlet     Method       Correct      Testlets     Loss
                                     Decisions    Given

1           27          Fixed Test   0.76         1.00         0.8200
3           9           Sequential   0.73         0.40         0.5781
3           9           Adaptive     0.73         0.43         0.5017
9           3           Sequential   0.70         0.29         0.4838
9           3           Adaptive     0.75         0.27         0.4484
27          1           Sequential   0.76         0.27         0.4023
27          1           Adaptive     0.71         0.23         0.4429


Table 5

Relation between selection method and loss
conjunctive model, $\rho = 0.80$

Number of   Items per   Selection    Proportion   Proportion   Mean
Testlets    Testlet     Method       Correct      Testlets     Loss
                                     Decisions    Given

1           32          Fixed Test   0.85         1.00         0.3549
4           8           Sequential   0.82         0.30         0.1475
4           8           Adaptive     0.80         0.26         0.1396
8           4           Sequential   0.78         0.22         0.1306
8           4           Adaptive     0.80         0.21         0.1302
32          1           Sequential   0.79         0.20         0.1277
32          1           Adaptive     0.80         0.20         0.1270



Table 6

Relation between selection method and loss
conjunctive model, $\rho = 0.40$

Number of   Items per   Selection    Proportion   Proportion   Mean
Testlets    Testlet     Method       Correct      Testlets     Loss
                                     Decisions    Given

1           32          Fixed Test   0.80         1.00         0.3999
4           8           Sequential   0.81         0.30         0.1765
4           8           Adaptive     0.81         0.24         0.1588
8           4           Sequential   0.81         0.23         0.1570
8           4           Adaptive     0.82         0.20         0.1377
32          1           Sequential   0.80         0.19         0.1375
32          1           Adaptive     0.81         0.19         0.1373



Table 7

Relation between selection method and loss
when multidimensionality is ignored
conjunctive model, $\rho = 0.80$

Number of   Items per   Selection    Proportion   Proportion   Mean
Testlets    Testlet     Method       Correct      Testlets     Loss
                                     Decisions    Given

1           32          Fixed Test   0.47         1.00         0.6208
4           8           Sequential   0.46         0.30         0.4581
8           4           Sequential   0.49         0.24         0.4247
32          1           Sequential   0.49         0.28         0.4340



Table 8

Relation between selection method and loss
when multidimensionality is ignored
conjunctive model, $\rho = 0.40$

Number of   Items per   Selection    Proportion   Proportion   Mean
Testlets    Testlet     Method       Correct      Testlets     Loss
                                     Decisions    Given

1           32          Fixed Test   0.41         1.00         0.6523
4           8           Sequential   0.43         0.31         0.5040
8           4           Sequential   0.41         0.25         0.5136
32          1           Sequential   0.44         0.29         0.4878


Table 9

Pattern of correct and incorrect decisions for
unidimensional approximation of conjunctive model

Correlation   State         Decision
                            Mastery   Non-mastery   Total

0.40          Mastery       0.23      0.08          0.31
              Non-mastery   0.51      0.18          0.69
              Total         0.74      0.26          1.00

0.80          Mastery       0.29      0.10          0.39
              Non-mastery   0.43      0.18          0.61
              Total         0.72      0.28          1.00